smp: fix proxy reconnection to relay after restart #1806
Open
shumvgolove wants to merge 3 commits into
Reproduces the proxy failing to reconnect to a destination relay when the sender disconnects mid-connection (empty session var left in smpClients).
getSessVar inserts an empty session var that the connect path then fills with putTMVar. If the connecting thread is killed by an async exception before that fill (a proxy worker on client disconnect, an agent worker on cancel), the empty var was left in the map forever and every later request for that server blocked on it until timing out (permanent PCEResponseTimeout). Wrap get-or-create with withGetSessVar (bracketOnError) at the call sites, so the cleanup is established where the var is created and covers the whole connect: on interrupt before fill the still-empty var is dropped and the next request reconnects. This closes the window between getSessVar and the fill that a handler installed inside the connect function cannot cover.
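The get-or-create wrapper described above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the code in this PR: the types `SessVar` and `Clients`, the helper `removeIfEmpty`, and the signatures of `getSessVar` and `withGetSessVar` are simplified assumptions, and the stalled connect is simulated with `threadDelay`.

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM
import Control.Exception (bracketOnError, finally)
import Control.Monad (when)
import qualified Data.Map.Strict as M

-- Hypothetical, simplified stand-ins for the real session var and client map.
type SessVar a = TMVar a
type Clients k a = TVar (M.Map k (SessVar a))

-- Return the existing var (Right) or insert a new empty one (Left).
getSessVar :: Ord k => Clients k a -> k -> STM (Either (SessVar a) (SessVar a))
getSessVar clients key = do
  m <- readTVar clients
  case M.lookup key m of
    Just v -> pure (Right v)
    Nothing -> do
      v <- newEmptyTMVar
      modifyTVar' clients (M.insert key v)
      pure (Left v)

-- Drop the var only if it is still the one in the map and still empty.
removeIfEmpty :: Ord k => Clients k a -> k -> SessVar a -> STM ()
removeIfEmpty clients key v = do
  isEmpty <- isEmptyTMVar v
  m <- readTVar clients
  when (isEmpty && M.lookup key m == Just v) $
    modifyTVar' clients (M.delete key)

-- The cleanup is registered where the var is created, so it covers the
-- whole connect action, including the window before the var is filled.
withGetSessVar :: Ord k => Clients k a -> k -> (Either (SessVar a) (SessVar a) -> IO b) -> IO b
withGetSessVar clients key =
  bracketOnError (atomically $ getSessVar clients key) cleanup
  where
    cleanup (Left v) = atomically $ removeIfEmpty clients key v
    cleanup (Right _) = pure ()

main :: IO ()
main = do
  clients <- newTVarIO M.empty :: IO (Clients String Int)
  started <- newEmptyMVar
  done <- newEmptyMVar
  tid <- forkIO $
    (withGetSessVar clients "relay" $ \_ -> do
        putMVar started ()
        threadDelay 10000000) -- simulated connect that never completes
      `finally` putMVar done ()
  takeMVar started -- the empty var is now in the map
  killThread tid   -- e.g. the sender disconnected
  takeMVar done    -- wait until the cleanup has run
  m <- readTVarIO clients
  print (M.member "relay" m) -- expected False: the empty var was dropped
```

Because the cleanup only deletes a var that is still empty and still the one stored under that key, a var that was already filled, or already replaced by a later connect, is left alone.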
UtilTests: tryAllErrors rethrows ThreadKilled/StackOverflow (the mechanism that skips putTMVar). SMPProxyTests: agent client reconnection after a cancelled connect, plus a control proving the stalling relay alone does not cause the failure; refine the relay reconnection tests.
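The property checked in UtilTests can be illustrated with a small stand-alone sketch. The function `trySync` below is a hypothetical stand-in, not the project's `tryAllErrors`; it only assumes the behaviour stated above, that asynchronous exceptions such as ThreadKilled and StackOverflow are rethrown rather than returned as `Left`.

```haskell
import Control.Exception

-- Hypothetical helper: catch synchronous exceptions as Left, but rethrow
-- anything wrapped in SomeAsyncException (ThreadKilled, StackOverflow, ...).
trySync :: IO a -> IO (Either SomeException a)
trySync action = do
  r <- try action
  case r of
    Left e | Just (SomeAsyncException _) <- fromException e -> throwIO e
    _ -> pure r

main :: IO ()
main = do
  -- A synchronous error is captured and returned as Left.
  r1 <- trySync (throwIO (userError "boom") :: IO ())
  putStrLn (either (const "sync error caught") (const "ok") r1)
  -- ThreadKilled passes through trySync, so the outer try observes it.
  r2 <- try (trySync (throwIO ThreadKilled :: IO ()))
          :: IO (Either AsyncException (Either SomeException ()))
  putStrLn (either (const "async exception rethrown") (const "swallowed") r2)
```

Any code sequenced after such a handler (for example a `putTMVar` that fills the session var) is skipped when the thread is killed, which is why the cleanup has to live in a bracket around the whole connect.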
1ddaee7 to 4642b2b
Problem
An SMP proxy permanently stops reconnecting to a destination relay after the relay restarts. The logs show repeated PCEResponseTimeout for that relay, and only restarting the proxy server recovers it.

Cause
A PRXY request makes the proxy open a connection to the relay in a worker forked from the sender's client. The worker inserts an empty session var into smpClients and then blocks in the connection/handshake. If the sender disconnects while that connect is in flight, the worker is killed by an async exception before the session var is ever filled.

Nothing removes an empty session var, so every later request to that relay waits on it until the connection timeout and fails with PROXY (BROKER TIMEOUT) - forever, even once the relay is healthy again.